Abstract: The paper presents an extension of cost-cumulant control theory over a finite horizon for a class of two-team pursuit-evasion games wherein the evolution of the states of the game, in response to decision strategies selected by the pursuit and evasion teams from non-inferior sets of admissible controls, is described by a stochastic linear differential equation and an integral-quadratic cost. Since the sum of the aggregate cost functions of the two teams is equal to zero, the amount that one team gains is equal to the amount that the other team loses. Both cooperation within each team and competition between the teams are presumed to exist. A direct dynamic programming approach for the Mayer optimization problem is used to solve for a multi-cumulant, non-inferior solution when the members of each team measure the states and minimize the first k cumulants of the standard integral-quadratic cost associated with this special class of multi-player pursuit-evasion games.

I.  Introduction 

Since the 1950s, the work of Isaacs [2] on the deterministic pursuit-evasion game of a single pursuer and a single evader with perfect information and common knowledge has been greatly extended to pursuit-evasion with multiple pursuers and multiple evaders. Recent developments in [6], [4] and the references therein treat probabilistic discrete-time and deterministic continuous-time problems, respectively. To the best knowledge of the authors, no work has yet been done on multi-player pursuit-evasion differential game problems wherein the members of each team have common interests in statistically improving their payoffs at the expense of the members of the rival team. In particular, this paper proposes a paradigm for non-inferior strategy selection using performance-measure statistics that provides not only a mechanism by which the common benefits of all members of each team can be optimized, but also an analytical tool for characterizing a complete statistical description of the global performance of the multi-player pursuit-evasion game. The present work has applications in multi-missile guidance and interception, military tactics, and strategic decision-making.

The paper is structured as follows. The necessary background for generating higher-order performance-measure statistics of the multi-player pursuit-evasion game is presented in Section II. These performance-measure statistics are then used to formulate the cost-cumulant control problem for the subject game. A precise mathematical formulation, along with several problem statements for the multi-player pursuit-evasion problem, is summarized in Section III. Finally, a multi-cumulant, non-inferior, saddle-point solution and some remarks are presented in Sections IV and V.

II.  Problem  Formulation 
For analytical tractability, consider a special class of differential games in which the dynamical systems of the pursuers and evaders are linear and the cost functions are quadratic in the states and controls. Specifically, consider a pursuit-evasion differential game with a team $P$ of $m_P$ pursuers, indexed by $i = 1, \ldots, m_P$, and a team $E$ of $m_E$ evaders, indexed by $j = 1, \ldots, m_E$, in an open subset of a Hilbert space $S$. Denote by $x_i^X(t) \equiv x_i^X(t, \omega_i^X) : [t_0, t_f] \times \Omega_i^X \mapsto \mathbb{R}^{n_i^X}$, belonging to the Hilbert space $L^2_{\mathcal{F}_i^X}(\Omega_i^X; C([t_0, t_f]; \mathbb{R}^{n_i^X}))$ of $\mathbb{R}^{n_i^X}$-valued, square-integrable processes on $[t_0, t_f]$ that are adapted to the $\sigma$-field $\mathcal{F}_i^X$ generated by $w_i^X(t)$ with $E\{\int_{t_0}^{t_f} (x_i^X)^T(\tau)\, x_i^X(\tau)\, d\tau\} < \infty$, the state variables of the members $i = 1, \ldots, m_X$ of each team $X = P, E$, whose corresponding physical positions in $S$ are described by

$$dx_i^X(t) = \left(A_i^X(t)\, x_i^X(t) + B_i^X(t)\, u_i^X(t)\right) dt + G_i^X(t)\, dw_i^X(t), \quad x_i^X(t_0) = x_{i0}^X, \tag{1}$$

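where the initial states $x_{i0}^X$ are known. The input noises $w_i^X(t) \equiv w_i^X(t, \omega_i^X) : [t_0, t_f] \times \Omega_i^X \mapsto \mathbb{R}^{p_i^X}$ are $p_i^X$-dimensional stationary Wiener processes, with $\{\mathcal{F}_{i,t}^X\}_{t \ge 0}$ their natural filtrations, defined on complete filtered probability spaces $(\Omega_i^X, \mathcal{F}_i^X, \{\mathcal{F}_{i,t}^X\}_{t \ge 0}, \mathcal{P}_i^X)$ over $[t_0, t_f]$ with the correlations of increments

$$E\left\{[w_i^X(\tau) - w_i^X(\xi)][w_i^X(\tau) - w_i^X(\xi)]^T\right\} = W_i^X\, |\tau - \xi|,$$

and continuous-time coefficients $A_i^X \in C([t_0, t_f]; \mathbb{R}^{n_i^X \times n_i^X})$, $B_i^X \in C([t_0, t_f]; \mathbb{R}^{n_i^X \times m_i^X})$, and $G_i^X \in C([t_0, t_f]; \mathbb{R}^{n_i^X \times p_i^X})$. In (1), $u_i^X \in U_i^X$ are the control vectors of the members of each team, where $U_i^X \subseteq L^2_{\mathcal{F}_i^X}(\Omega_i^X; C([t_0, t_f]; \mathbb{R}^{m_i^X}))$ are the sets of corresponding admissible control strategies in the Hilbert space of $\mathbb{R}^{m_i^X}$-valued, square-integrable processes on $[t_0, t_f]$ that are adapted to the $\sigma$-field $\mathcal{F}_i^X$ generated by $w_i^X(t)$. For simplicity of notation, let $x^X \triangleq [(x_1^X)^T, \ldots, (x_{m_X}^X)^T]^T$, $u^X \triangleq [(u_1^X)^T, \ldots, (u_{m_X}^X)^T]^T$, $A^X \triangleq \mathrm{diag}(A_1^X, \ldots, A_{m_X}^X)$, $B^X \triangleq \mathrm{diag}(B_1^X, \ldots, B_{m_X}^X)$, and $G^X \triangleq \mathrm{diag}(G_1^X, \ldots, G_{m_X}^X)$. Then the dynamic equations of the multiple pursuers and evaders can be rewritten in compact form as

$$dx^X(t) = \left(A^X(t)\, x^X(t) + B^X(t)\, u^X(t)\right) dt + G^X(t)\, dw^X(t), \quad x^X(t_0) = x_0^X, \tag{2}$$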


and the aggregate dynamic equation of the multi-player pursuit-evasion differential game is then given by

$$dx(t) = \left(A(t)\, x(t) + B_P(t)\, u^P(t) + B_E(t)\, u^E(t)\right) dt + G(t)\, dw(t), \quad x(t_0) = x_0, \tag{3}$$

where $A \triangleq \mathrm{diag}(A^P, A^E)$, $B_P \triangleq [(B^P)^T, 0]^T$, $B_E \triangleq [0, (B^E)^T]^T$, $G \triangleq \mathrm{diag}(G^P, G^E)$, $x \triangleq [(x^P)^T, (x^E)^T]^T$, $dw \triangleq [(dw^P)^T, (dw^E)^T]^T$, and $W \triangleq \mathrm{diag}(W_1^P, \ldots, W_{m_P}^P, W_1^E, \ldots, W_{m_E}^E)$. Since not all evaders will be captured at the same time, the terminal time of the game, $t_f$, should be defined based on the capture of all evaders.
Definition 1: Terminal Time.

For any evader in $\{j\}_{j=1}^{m_E}$, assume that there exists a pursuer in $\{i\}_{i=1}^{m_P}$ engaged with at least one evader. The capture time $t_j$ of evader $j$ is given by

$$t_j \triangleq \inf\left\{t > 0,\ \exists\, i : d\big(x_i^P(t), x_j^E(t)\big) < \epsilon,\ \epsilon \in \mathbb{R}^+\right\}. \tag{4}$$

Then, the terminal time $t_f$ of the pursuit-evasion game is

$$t_f \triangleq \max_{1 \le j \le m_E} \{t_j\}. \tag{5}$$
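Definition 1 can be illustrated on sampled trajectories. The following is a minimal sketch, assuming positions are available at a common list of sample times and that $d(\cdot,\cdot)$ is the Euclidean distance; the function names `capture_time` and `terminal_time` and the sample data are hypothetical and are not part of the paper.

```python
import math

def capture_time(times, pursuer_paths, evader_path, eps):
    """First sample time t > 0 at which some pursuer is within eps of the evader.

    Returns math.inf when the evader is never captured on the sampled horizon.
    """
    for k, t in enumerate(times):
        if t <= 0:
            continue  # (4) takes the infimum over t > 0
        for path in pursuer_paths:
            if math.dist(path[k], evader_path[k]) < eps:
                return t
    return math.inf

def terminal_time(times, pursuer_paths, evader_paths, eps):
    """Terminal time (5): the largest capture time over all evaders."""
    return max(capture_time(times, pursuer_paths, e, eps) for e in evader_paths)

# Illustrative data (assumed): two pursuers and two evaders in the plane.
times = [0.0, 1.0, 2.0, 3.0]
pursuers = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    [(1.0, 3.0), (1.0, 2.0), (1.0, 1.2), (1.0, 1.0)],
]
evaders = [
    [(3.0, 0.0)] * 4,
    [(1.0, 1.0)] * 4,
]
capture_times = [capture_time(times, pursuers, e, 0.5) for e in evaders]
game_end = terminal_time(times, pursuers, evaders, 0.5)
```

An evader that is never approached within `eps` yields an infinite capture time, which matches the remark that the terminal time of the game may be infinite.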

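Note that the terminal time could be infinite because the pursuers may be unable to capture some evaders whose physical and functional characteristics are superior to those of team $P$. Let $U^P \triangleq \times_{i=1}^{m_P} U_i^P$, $U^E \triangleq \times_{j=1}^{m_E} U_j^E$, and $\mathcal{X} \triangleq \times_{i=1}^{m_P} \mathbb{R}^{n_i^P} \times \times_{j=1}^{m_E} \mathbb{R}^{n_j^E} \subset S$. Then, associated with each $(u^P, u^E) \in U^P \times U^E$ is a finite-horizon integral-quadratic-form (IQF) cost $J_i^X : [t_0, t_f] \times \mathcal{X} \times U^P \times U^E \mapsto \mathbb{R}^+ \cup \{0\}$ which member $i$ of team $X$, for $X = P, E$, attempts to optimize

$$J_i^X(t_0, x_0; u^P, u^E) = x^T(t_f)\, Q_{if}^X\, x(t_f) + \int_{t_0}^{t_f} \Big[ x^T(\tau)\, Q_i^X(\tau)\, x(\tau) + \sum_{j=1}^{m_P} (u_j^P)^T(\tau)\, R_{ij}^{XP}(\tau)\, u_j^P(\tau) + \sum_{j=1}^{m_E} (u_j^E)^T(\tau)\, R_{ij}^{XE}(\tau)\, u_j^E(\tau) \Big] d\tau, \tag{6}$$

subject to the dynamics of the differential game (3), where $Q_{if}^X \in \mathbb{R}^{(\sum_{i=1}^{m_P} n_i^P + \sum_{j=1}^{m_E} n_j^E) \times (\sum_{i=1}^{m_P} n_i^P + \sum_{j=1}^{m_E} n_j^E)}$, $Q_i^X \in C([t_0, t_f]; \mathbb{R}^{(\sum_{i=1}^{m_P} n_i^P + \sum_{j=1}^{m_E} n_j^E) \times (\sum_{i=1}^{m_P} n_i^P + \sum_{j=1}^{m_E} n_j^E)})$, and the cross-coupling control weightings $R_{ij}^{XP} \in C([t_0, t_f]; \mathbb{R}^{m_j^P \times m_j^P})$ and $R_{ij}^{XE} \in C([t_0, t_f]; \mathbb{R}^{m_j^E \times m_j^E})$ are symmetric and positive semidefinite, with the own-control weightings $R_{ii}^{XX}(t)$, $X = P, E$, invertible.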

Within a cooperative team $X$, a negotiated solution among all members is of interest. Negotiation takes place via mutual and enforceable agreements among the team members. This solution is selected from the set of strategy $m_X$-tuples defined below.

Definition 2: Non-inferior Strategies.

The strategy $m_X$-tuple $v^X = (v_1^X, \ldots, v_{m_X}^X)$ belongs to the non-inferior set if, for any other strategy $m_X$-tuple $u^X$, $\{J_i^X(t_0, x_0; u^X, \cdot) \le J_i^X(t_0, x_0; v^X, \cdot)\}$ only if $\{J_i^X(t_0, x_0; u^X, \cdot) = J_i^X(t_0, x_0; v^X, \cdot)\}$, for all $i = 1, \ldots, m_X$.

Since the IQF costs (6) are convex functions on a convex set $U^P \times U^E$ with convex constraints (3), the problem of solving for a set of non-inferior strategies within each team $X$ with a vector cost criterion is equivalent to the problem of solving an $(m_X - 1)$-parameter family of optimal control problems with scalar cost criteria [5], [7]. Each non-inferior strategy $m_X$-tuple therefore minimizes the scalar criterion

$$J^X(t_0, x_0; u^P, u^E; \xi^X) = \sum_{i=1}^{m_X} \xi_i^X\, J_i^X(t_0, x_0; u^P, u^E), \tag{7}$$

where the set of team strategy profiles $\xi^X \in W^X$ is defined as follows

$$W^X \triangleq \Big\{ \xi^X \in \mathbb{R}^{m_X} : \sum_{i=1}^{m_X} \xi_i^X = 1;\ 0 \le \xi_i^X \le 1 \Big\}. \tag{8}$$
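The scalarization in (7)-(8) amounts to a convex combination of the member costs. A minimal sketch follows, assuming the individual costs $J_i^X$ have already been evaluated as plain numbers; the helper names `is_team_profile` and `team_cost` are hypothetical.

```python
def is_team_profile(xi, tol=1e-12):
    """Check membership in the set W^X of (8): weights in [0, 1] that sum to one."""
    in_range = all(-tol <= w <= 1.0 + tol for w in xi)
    return in_range and abs(sum(xi) - 1.0) <= tol

def team_cost(xi, member_costs):
    """Scalar team criterion (7): weighted sum of the member costs J_i^X."""
    if len(xi) != len(member_costs):
        raise ValueError("one weight per team member is required")
    if not is_team_profile(xi):
        raise ValueError("xi must lie in the set W^X defined in (8)")
    return sum(w * c for w, c in zip(xi, member_costs))

# Illustrative values (assumed): two team members with costs 4 and 8.
example_cost = team_cost([0.25, 0.75], [4.0, 8.0])
```

Sweeping `xi` over the set (8) traces out the $(m_X - 1)$-parameter family of scalar problems mentioned above.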


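Let $Q_f^X \triangleq \sum_{i=1}^{m_X} \xi_i^X Q_{if}^X$, $Q^X \triangleq \sum_{i=1}^{m_X} \xi_i^X Q_i^X$, $R_j^{XP} \triangleq \sum_{i=1}^{m_X} \xi_i^X R_{ij}^{XP}$, and $R_j^{XE} \triangleq \sum_{i=1}^{m_X} \xi_i^X R_{ij}^{XE}$. Then the aggregate cost (7) can be written explicitly as follows

$$J^X(t_0, x_0; u^P, u^E; \xi^X) = x^T(t_f)\, Q_f^X\, x(t_f) + \int_{t_0}^{t_f} \Big[ x^T(\tau)\, Q^X(\tau)\, x(\tau) + \sum_{j=1}^{m_P} (u_j^P)^T(\tau)\, R_j^{XP}(\tau)\, u_j^P(\tau) + \sum_{j=1}^{m_E} (u_j^E)^T(\tau)\, R_j^{XE}(\tau)\, u_j^E(\tau) \Big] d\tau. \tag{9}$$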


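For a compact notation, let $R^{XP} \triangleq \mathrm{diag}(R_1^{XP}, \ldots, R_{m_P}^{XP})$ and $R^{XE} \triangleq \mathrm{diag}(R_1^{XE}, \ldots, R_{m_E}^{XE})$. The negotiating cost (9) associated with team $X$ then becomes

$$J^X(t_0, x_0; u^P, u^E; \xi^X) = x^T(t_f)\, Q_f^X\, x(t_f) + \int_{t_0}^{t_f} \Big[ x^T(\tau)\, Q^X(\tau)\, x(\tau) + (u^P)^T(\tau)\, R^{XP}(\tau)\, u^P(\tau) + (u^E)^T(\tau)\, R^{XE}(\tau)\, u^E(\tau) \Big] d\tau. \tag{10}$$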


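In fact, the game is zero-sum only if $R^{PP} = -R^{EP} \triangleq R^P$, $R^{EE} = -R^{PE} \triangleq R^E$, $Q_f^P = -Q_f^E \triangleq Q_f$, and $Q^P = -Q^E \triangleq Q$. Substituting these results into (10), one obtains the zero-sum differential game cost

$$J(t_0, x_0; u^P, u^E) = x^T(t_f)\, Q_f\, x(t_f) + \int_{t_0}^{t_f} \Big[ x^T(\tau)\, Q(\tau)\, x(\tau) + (u^P)^T(\tau)\, R^P(\tau)\, u^P(\tau) - (u^E)^T(\tau)\, R^E(\tau)\, u^E(\tau) \Big] d\tau. \tag{11}$$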


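In view of the linear system (3) and the quadratic performance measure (11), it is reasonable to assume that both teams $P$ and $E$ choose their control actions from classes of linear memoryless-feedback strategies, $\gamma^P : [t_0, t_f] \times L^2_{\mathcal{F}_t}(\Omega; C([t_0, t_f]; \mathcal{X})) \mapsto L^2_{\mathcal{F}_t}(\Omega; C([t_0, t_f]; U^P))$ and $\gamma^E : [t_0, t_f] \times L^2_{\mathcal{F}_t}(\Omega; C([t_0, t_f]; \mathcal{X})) \mapsto L^2_{\mathcal{F}_t}(\Omega; C([t_0, t_f]; U^E))$,

$$u^P(t) = \gamma^P(t, x(t)) \triangleq K^P(t)\, x(t), \tag{12}$$

$$u^E(t) = \gamma^E(t, x(t)) \triangleq K^E(t)\, x(t), \tag{13}$$

where $K^P \in C([t_0, t_f]; \mathbb{R}^{\sum_{i=1}^{m_P} m_i^P \times (\sum_{i=1}^{m_P} n_i^P + \sum_{j=1}^{m_E} n_j^E)})$ and $K^E \in C([t_0, t_f]; \mathbb{R}^{\sum_{j=1}^{m_E} m_j^E \times (\sum_{i=1}^{m_P} n_i^P + \sum_{j=1}^{m_E} n_j^E)})$ are admissible gains for teams $P$ and $E$. For the given initial condition $(t_0, x_0) \in [t_0, t_f] \times \mathcal{X}$ and control strategies subject to (12)-(13), the dynamics of the game (3) are then given by

$$dx(t) = \left[A(t) + B_P(t) K^P(t) + B_E(t) K^E(t)\right] x(t)\, dt + G(t)\, dw(t), \quad x(t_0) = x_0, \tag{14}$$

and the IQF cost becomes

$$J(t_0, x_0; K^P, K^E) = x^T(t_f)\, Q_f\, x(t_f) + \int_{t_0}^{t_f} x^T(\tau) \left[ Q(\tau) + (K^P)^T(\tau) R^P(\tau) K^P(\tau) - (K^E)^T(\tau) R^E(\tau) K^E(\tau) \right] x(\tau)\, d\tau. \tag{15}$$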
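To make the random cost (15) concrete, the following sketch draws Monte Carlo samples of it for a scalar, time-invariant instance of (14) using an Euler-Maruyama discretization. The scalar setting, the step count, the seed, and the names `closed_loop` and `sample_cost` are assumptions for illustration only. With zero closed-loop drift, unit state weight, unit noise intensity, $x_0 = 1$ and horizon $[0, 1]$, the exact mean cost is $x_0^2 T + T^2/2 = 1.5$, which the sample mean should approach.

```python
import random

def closed_loop(a, b_p, b_e, k_p, k_e, q, r_p, r_e):
    """Scalar closed-loop drift of (14) and effective state weight of (15)."""
    a_cl = a + b_p * k_p + b_e * k_e
    q_cl = q + r_p * k_p ** 2 - r_e * k_e ** 2
    return a_cl, q_cl

def sample_cost(x0, a_cl, q_cl, q_f, g, w, t0, tf, steps, rng):
    """One Euler-Maruyama sample of the cost (15) for dx = a_cl x dt + g dw."""
    dt = (tf - t0) / steps
    x, cost = x0, 0.0
    for _ in range(steps):
        cost += q_cl * x * x * dt             # running cost (left Riemann sum)
        dw = rng.gauss(0.0, (w * dt) ** 0.5)  # Wiener increment with variance w*dt
        x += a_cl * x * dt + g * dw
    return cost + q_f * x * x                 # add the terminal penalty

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [sample_cost(1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 100, rng)
           for _ in range(2000)]
mean_cost = sum(samples) / len(samples)
```

The spread of `samples` around `mean_cost` is what the higher-order cumulants developed next are meant to capture.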

It is now necessary to develop a procedure for generating cost cumulants for the zero-sum stochastic differential game by adapting the parametric method in [3] to characterize a moment-generating function. These cost cumulants are then used to form a performance index in the cost-cumulant control optimization. This approach begins with a replacement of the initial condition $(t_0, x_0)$ by an arbitrary pair $(\alpha, x_\alpha)$. Thus, for the given admissible feedback gains $K^P$ and $K^E$, the cost functional (15) is seen as the "cost-to-go", $J(\alpha, x_\alpha)$. The moment-generating function of the vector-valued random process (14) is given by

$$\varphi(\alpha, x_\alpha; \theta) \triangleq E\left\{\exp\big(\theta J(\alpha, x_\alpha)\big)\right\}, \tag{16}$$

where the scalar $\theta \in \mathbb{R}^+$ is a small parameter. Thus, the cumulant-generating function immediately follows

$$\psi(\alpha, x_\alpha; \theta) \triangleq \ln\left\{\varphi(\alpha, x_\alpha; \theta)\right\}, \tag{17}$$

in which $\ln\{\cdot\}$ denotes the natural logarithmic transformation of an enclosed entity.

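Theorem 1: Cost Cumulant-Generating Function.

For all $\alpha \in [t_0, t_f]$ and the small parameter $\theta \in \mathbb{R}^+$, define

$$\varphi(\alpha, x_\alpha; \theta) \triangleq \varrho(\alpha, \theta) \exp\left\{x_\alpha^T \Upsilon(\alpha, \theta)\, x_\alpha\right\}, \tag{18}$$

$$\upsilon(\alpha, \theta) \triangleq \ln\{\varrho(\alpha, \theta)\}. \tag{19}$$

Then the cost cumulant-generating function is expressed as

$$\psi(\alpha, x_\alpha; \theta) = x_\alpha^T \Upsilon(\alpha, \theta)\, x_\alpha + \upsilon(\alpha, \theta), \tag{20}$$

in which the scalar solution $\upsilon(\alpha, \theta)$ solves the backward-in-time differential equation with $\upsilon(t_f, \theta) = 0$

$$\frac{d}{d\alpha} \upsilon(\alpha, \theta) = -\mathrm{Tr}\left\{\Upsilon(\alpha, \theta)\, G(\alpha) W G^T(\alpha)\right\}, \tag{21}$$

whereas $\Upsilon(\alpha, \theta)$ satisfies the backward-in-time differential equation together with $\Upsilon(t_f, \theta) = \theta Q_f$

$$\begin{aligned}
\frac{d}{d\alpha} \Upsilon(\alpha, \theta) = &- [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)]^T \Upsilon(\alpha, \theta) \\
&- \Upsilon(\alpha, \theta) [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)] \\
&- 2 \Upsilon(\alpha, \theta)\, G(\alpha) W G^T(\alpha)\, \Upsilon(\alpha, \theta) \\
&- \theta \left[Q(\alpha) + (K^P)^T(\alpha) R^P(\alpha) K^P(\alpha) - (K^E)^T(\alpha) R^E(\alpha) K^E(\alpha)\right].
\end{aligned} \tag{22}$$

Meanwhile, $\varrho(\alpha, \theta)$ satisfies the backward-in-time differential equation with $\varrho(t_f, \theta) = 1$

$$\frac{d}{d\alpha} \varrho(\alpha, \theta) = -\varrho(\alpha, \theta)\, \mathrm{Tr}\left\{\Upsilon(\alpha, \theta)\, G(\alpha) W G^T(\alpha)\right\}. \tag{23}$$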
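A quick numerical sanity check of (22) is possible in the scalar, time-invariant case. The sketch below, under those assumptions, integrates the scalar form of (22) backward in time with a classical Runge-Kutta scheme; `sigma` stands for the scalar value of $G W G^T$, and the names `rk4_backward` and `upsilon_at_t0` are hypothetical. For zero closed-loop drift, unit effective state weight and $Q_f = 0$, the scalar equation has the closed-form solution $\Upsilon(t_0, \theta) = \sqrt{\theta / (2\sigma)}\, \tan\big(\sqrt{2 \sigma \theta}\,(t_f - t_0)\big)$, which the numerical value can be compared against.

```python
import math

def rk4_backward(rhs, y_end, t_end, t_start, steps):
    """Integrate dy/dt = rhs(t, y) from t_end back to t_start; return y(t_start)."""
    h = (t_start - t_end) / steps  # negative step: backward in time
    t, y = t_end, y_end
    for _ in range(steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, y + h * k1 / 2)
        k3 = rhs(t + h / 2, y + h * k2 / 2)
        k4 = rhs(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

def upsilon_at_t0(theta, a_cl, q_cl, q_f, sigma, t0, tf, steps=2000):
    """Scalar version of (22) with terminal value theta * q_f, evaluated at t0."""
    def rhs(alpha, ups):
        return -2 * a_cl * ups - 2 * sigma * ups * ups - theta * q_cl
    return rk4_backward(rhs, theta * q_f, tf, t0, steps)

theta = 0.1
numeric = upsilon_at_t0(theta, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0)
closed_form = math.sqrt(theta / 2.0) * math.tan(math.sqrt(2.0 * theta))
```

For small `theta` the ratio `upsilon_at_t0(theta, ...) / theta` approaches the first-cumulant coefficient, which anticipates the series expansion used below.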

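Proof. For any given $\theta$, let $\varpi(\alpha, x_\alpha; \theta) \triangleq \exp\{\theta J(\alpha, x_\alpha)\}$; then the moment-generating function becomes

$$\varphi(\alpha, x_\alpha; \theta) = E\{\varpi(\alpha, x_\alpha; \theta)\},$$

with the time derivative

$$\frac{d}{d\alpha} \varphi(\alpha, x_\alpha; \theta) = -\varphi(\alpha, x_\alpha; \theta)\, \theta\, x_\alpha^T \left[Q(\alpha) + (K^P)^T(\alpha) R^P(\alpha) K^P(\alpha) - (K^E)^T(\alpha) R^E(\alpha) K^E(\alpha)\right] x_\alpha.$$

Using the standard Ito formula, one gets

$$\begin{aligned}
d\varphi(\alpha, x_\alpha; \theta) &= E\{d\varpi(\alpha, x_\alpha; \theta)\} \\
&= E\Big\{\varpi_\alpha(\alpha, x_\alpha; \theta)\, d\alpha + \varpi_{x_\alpha}(\alpha, x_\alpha; \theta)\, dx_\alpha + \tfrac{1}{2} \mathrm{Tr}\left\{\varpi_{x_\alpha x_\alpha}(\alpha, x_\alpha; \theta)\, G(\alpha) W G^T(\alpha)\right\} d\alpha\Big\} \\
&= \varphi_\alpha(\alpha, x_\alpha; \theta)\, d\alpha + \varphi_{x_\alpha}(\alpha, x_\alpha; \theta) [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)]\, x_\alpha\, d\alpha \\
&\quad + \tfrac{1}{2} \mathrm{Tr}\left\{\varphi_{x_\alpha x_\alpha}(\alpha, x_\alpha; \theta)\, G(\alpha) W G^T(\alpha)\right\} d\alpha,
\end{aligned}$$

which, with the definition (18), leads to

$$\begin{aligned}
&-\varphi(\alpha, x_\alpha; \theta)\, \theta\, x_\alpha^T \left[Q(\alpha) + (K^P)^T(\alpha) R^P(\alpha) K^P(\alpha) - (K^E)^T(\alpha) R^E(\alpha) K^E(\alpha)\right] x_\alpha \\
&\quad = \frac{\frac{d}{d\alpha} \varrho(\alpha, \theta)}{\varrho(\alpha, \theta)}\, \varphi(\alpha, x_\alpha; \theta) + \varphi(\alpha, x_\alpha; \theta)\, x_\alpha^T \frac{d}{d\alpha} \Upsilon(\alpha, \theta)\, x_\alpha \\
&\qquad + \varphi(\alpha, x_\alpha; \theta) \Big\{ x_\alpha^T [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)]^T \Upsilon(\alpha, \theta)\, x_\alpha \\
&\qquad\qquad + x_\alpha^T \Upsilon(\alpha, \theta) [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)]\, x_\alpha \Big\} \\
&\qquad + \varphi(\alpha, x_\alpha; \theta) \Big\{ 2 x_\alpha^T \Upsilon(\alpha, \theta)\, G(\alpha) W G^T(\alpha)\, \Upsilon(\alpha, \theta)\, x_\alpha + \mathrm{Tr}\left\{\Upsilon(\alpha, \theta)\, G(\alpha) W G^T(\alpha)\right\} \Big\}.
\end{aligned}$$

For the constant and quadratic terms to be independent of $x_\alpha$, it is required that

$$\begin{aligned}
\frac{d}{d\alpha} \Upsilon(\alpha, \theta) = &- [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)]^T \Upsilon(\alpha, \theta) \\
&- \Upsilon(\alpha, \theta) [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)] \\
&- 2 \Upsilon(\alpha, \theta)\, G(\alpha) W G^T(\alpha)\, \Upsilon(\alpha, \theta) \\
&- \theta \left[Q(\alpha) + (K^P)^T(\alpha) R^P(\alpha) K^P(\alpha) - (K^E)^T(\alpha) R^E(\alpha) K^E(\alpha)\right],
\end{aligned}$$

$$\frac{d}{d\alpha} \varrho(\alpha, \theta) = -\varrho(\alpha, \theta)\, \mathrm{Tr}\left\{\Upsilon(\alpha, \theta)\, G(\alpha) W G^T(\alpha)\right\},$$

with the terminal conditions $\Upsilon(t_f, \theta) = \theta Q_f$ and $\varrho(t_f, \theta) = 1$. Finally, the remaining backward-in-time differential equation satisfied by $\upsilon(\alpha, \theta)$ is given by

$$\frac{d}{d\alpha} \upsilon(\alpha, \theta) = -\mathrm{Tr}\left\{\Upsilon(\alpha, \theta)\, G(\alpha) W G^T(\alpha)\right\}, \quad \upsilon(t_f, \theta) = 0,$$

which completes the proof.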

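The Maclaurin expansion of the cumulant-generating function is used to generate cost cumulants for the multi-player pursuit-evasion game

$$\psi(\alpha, x_\alpha; \theta) = \sum_{i=1}^{\infty} \kappa_i(\alpha, x_\alpha) \frac{\theta^i}{i!} = \sum_{i=1}^{\infty} \frac{\partial^{(i)}}{\partial \theta^{(i)}} \psi(\alpha, x_\alpha; \theta) \Big|_{\theta = 0} \frac{\theta^i}{i!}, \tag{24}$$

in which the $\kappa_i(\alpha, x_\alpha)$ are the cost cumulants. Note that the series coefficients can be computed by using (20)

$$\frac{\partial^{(i)}}{\partial \theta^{(i)}} \psi(\alpha, x_\alpha; \theta) \Big|_{\theta = 0} = x_\alpha^T \frac{\partial^{(i)}}{\partial \theta^{(i)}} \Upsilon(\alpha, \theta) \Big|_{\theta = 0} x_\alpha + \frac{\partial^{(i)}}{\partial \theta^{(i)}} \upsilon(\alpha, \theta) \Big|_{\theta = 0}. \tag{25}$$

Cost cumulants for the stochastic differential game problem can be obtained using (24) and (25) as follows

$$\kappa_i(\alpha, x_\alpha) = x_\alpha^T \frac{\partial^{(i)}}{\partial \theta^{(i)}} \Upsilon(\alpha, \theta) \Big|_{\theta = 0} x_\alpha + \frac{\partial^{(i)}}{\partial \theta^{(i)}} \upsilon(\alpha, \theta) \Big|_{\theta = 0}, \tag{26}$$

for any finite $1 \le i < \infty$. For notational convenience, the following definitions are introduced

$$H(\alpha, i) \triangleq \frac{\partial^{(i)}}{\partial \theta^{(i)}} \Upsilon(\alpha, \theta) \Big|_{\theta = 0}, \qquad D(\alpha, i) \triangleq \frac{\partial^{(i)}}{\partial \theta^{(i)}} \upsilon(\alpha, \theta) \Big|_{\theta = 0}. \tag{27}$$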
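As a reminder of what the coefficients in (24) represent, the first two derivatives of $\psi = \ln \varphi$ at $\theta = 0$ can be worked out directly from (16)-(17). This is a standard identity rather than a result of the paper, written here with $J$ as shorthand for $J(\alpha, x_\alpha)$ and primes denoting derivatives with respect to $\theta$, and assuming the first two moments of $J$ exist; note that $\varphi(\alpha, x_\alpha; 0) = 1$.

```latex
\begin{aligned}
\kappa_1 &= \frac{\partial \psi}{\partial \theta}\Big|_{\theta=0}
          = \frac{\varphi'}{\varphi}\Big|_{\theta=0}
          = E\{J\}, \\
\kappa_2 &= \frac{\partial^2 \psi}{\partial \theta^2}\Big|_{\theta=0}
          = \left[\frac{\varphi''}{\varphi} - \left(\frac{\varphi'}{\varphi}\right)^2\right]_{\theta=0}
          = E\{J^2\} - \left(E\{J\}\right)^2 .
\end{aligned}
```

The first cumulant is therefore the mean of the cost and the second is its variance.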


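Theorem 2: Cumulants in Multi-Player Pursuit-Evasion.

Suppose the multi-player pursuit-evasion game is characterized by (14)-(15), where $(A, B_P)$ and $(A, B_E)$ are uniformly stabilizable. The two teams are presumed to choose their control strategies $(u^P(t), u^E(t)) = (K^P(t) x(t), K^E(t) x(t))$. For given $k \in \mathbb{Z}^+$, $\xi^P \in W^P$, and $\xi^E \in W^E$, the $k$th cost cumulant in multi-player pursuit-evasion is computed by

$$\kappa_k(t_0, x_0; \xi^P, \xi^E; K^P, K^E) = x_0^T H(t_0, k)\, x_0 + D(t_0, k), \tag{28}$$

in which the cumulant-building variables $\{H(\alpha, i)\}_{i=1}^{k}$ and $\{D(\alpha, i)\}_{i=1}^{k}$ evaluated at $\alpha = t_0$ satisfy the following differential equations (with the dependence of $H(\alpha, i)$ and $D(\alpha, i)$ upon the admissible gains $K^P$ and $K^E$ suppressed)

$$\begin{aligned}
\frac{d}{d\alpha} H(\alpha, 1) = &- [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)]^T H(\alpha, 1) \\
&- H(\alpha, 1) [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)] - Q(\alpha) \\
&- (K^P)^T(\alpha) R^P(\alpha) K^P(\alpha) + (K^E)^T(\alpha) R^E(\alpha) K^E(\alpha),
\end{aligned} \tag{29}$$

and, for $2 \le i \le k$,

$$\begin{aligned}
\frac{d}{d\alpha} H(\alpha, i) = &- [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)]^T H(\alpha, i) \\
&- H(\alpha, i) [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)] \\
&- \sum_{j=1}^{i-1} \frac{2\, i!}{j!\,(i-j)!} H(\alpha, j)\, G(\alpha) W G^T(\alpha)\, H(\alpha, i - j),
\end{aligned} \tag{30}$$

together with, for $1 \le i \le k$,

$$\frac{d}{d\alpha} D(\alpha, i) = -\mathrm{Tr}\left\{H(\alpha, i)\, G(\alpha) W G^T(\alpha)\right\}, \tag{31}$$

where the terminal conditions are $H(t_f, 1) = Q_f$, $H(t_f, i) = 0$ for $2 \le i \le k$, and $D(t_f, i) = 0$ for $1 \le i \le k$.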
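The equations (29)-(31) can be exercised numerically in the scalar, time-invariant case. The sketch below, under those assumptions, stacks $H(\alpha, i)$ and $D(\alpha, i)$ into one vector, integrates it backward from $t_f$ with a fixed-step Runge-Kutta scheme, and assembles the cumulants via (28). `sigma` stands for the scalar $G W G^T$, `a_cl` for the closed-loop drift, and `q_cl` for the effective state weight $Q + (K^P)^T R^P K^P - (K^E)^T R^E K^E$; the function names are hypothetical.

```python
from math import factorial

def cumulant_rhs(y, k, a_cl, q_cl, sigma):
    """Right-hand sides of (29)-(31) for scalar H(., i) and D(., i), i = 1..k."""
    H = y[:k]
    dH = []
    for i in range(1, k + 1):
        val = -2.0 * a_cl * H[i - 1]
        if i == 1:
            val -= q_cl                                   # forcing term of (29)
        else:
            val -= sum(2.0 * factorial(i) / (factorial(j) * factorial(i - j))
                       * H[j - 1] * sigma * H[i - j - 1]  # coupling term of (30)
                       for j in range(1, i))
        dH.append(val)
    dD = [-sigma * h for h in H]                          # equation (31)
    return dH + dD

def cost_cumulants(x0, k, a_cl, q_cl, q_f, sigma, t0, tf, steps=2000):
    """First k cost cumulants (28) at (t0, x0) via backward RK4 integration."""
    y = [q_f] + [0.0] * (2 * k - 1)  # terminal values: H_1 = q_f, all others zero
    h = (t0 - tf) / steps            # negative step: backward in time

    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]

    for _ in range(steps):
        k1 = cumulant_rhs(y, k, a_cl, q_cl, sigma)
        k2 = cumulant_rhs(shifted(y, k1, h / 2), k, a_cl, q_cl, sigma)
        k3 = cumulant_rhs(shifted(y, k2, h / 2), k, a_cl, q_cl, sigma)
        k4 = cumulant_rhs(shifted(y, k3, h), k, a_cl, q_cl, sigma)
        y = [yi + h * (s1 + 2 * s2 + 2 * s3 + s4) / 6
             for yi, s1, s2, s3, s4 in zip(y, k1, k2, k3, k4)]
    return [x0 * y[i] * x0 + y[k + i] for i in range(k)]

kappa = cost_cumulants(1.0, 2, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0)
```

For this particular choice (zero drift, unit weight, unit noise, $x_0 = 1$, horizon $[0, 1]$) the equations integrate in closed form to $\kappa_1 = 3/2$ and $\kappa_2 = 5/3$, which gives a convenient check on the implementation.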

Proof. The cost cumulant expression in (28) is readily justified by using the result (26) and the definitions (27). What remains is to show that the solutions $H(\alpha, i)$ and $D(\alpha, i)$ for $1 \le i \le k$ indeed satisfy the equations (29)-(31). Note that the equations (29)-(31) satisfied by the solutions $H(\alpha, i)$ and $D(\alpha, i)$ can be obtained by repeatedly taking the derivative with respect to $\theta$ of the equations (21)-(22), together with the assumption that $A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)$ is stable for all $\alpha \in$

III.  Problem  Statements 

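In the subsequent development, the subset of symmetric matrices of the vector space of all $n \times n$ matrices with real elements is denoted by $\mathbb{S}^n$, where $n \triangleq \sum_{i=1}^{m_P} n_i^P + \sum_{j=1}^{m_E} n_j^E$.

Now let the $k$-tuple variables $\mathcal{H}$ and $\mathcal{D}$ be defined as $\mathcal{H}(\cdot) \triangleq (\mathcal{H}_1(\cdot), \ldots, \mathcal{H}_k(\cdot))$ and $\mathcal{D}(\cdot) \triangleq (\mathcal{D}_1(\cdot), \ldots, \mathcal{D}_k(\cdot))$, with each element $\mathcal{H}_i \in C^1([t_0, t_f]; \mathbb{S}^n)$ of $\mathcal{H}$ and $\mathcal{D}_i \in C^1([t_0, t_f]; \mathbb{R})$ of $\mathcal{D}$ having the representations $\mathcal{H}_i(\cdot) \triangleq H(\cdot, i)$ and $\mathcal{D}_i(\cdot) \triangleq D(\cdot, i)$, with the right members satisfying the dynamic equations (29)-(31) on the horizon $[t_0, t_f]$. For notational tractability, the following mappings are introduced

$$\mathcal{F}_i : [t_0, t_f] \times (\mathbb{S}^n)^k \times \mathbb{R}^{m^P \times n} \times \mathbb{R}^{m^E \times n} \mapsto \mathbb{S}^n,$$

$$\mathcal{G}_i : [t_0, t_f] \times (\mathbb{S}^n)^k \mapsto \mathbb{R},$$

where $m^P \triangleq \sum_{i=1}^{m_P} m_i^P$, $m^E \triangleq \sum_{j=1}^{m_E} m_j^E$, and the actions are given by

$$\begin{aligned}
\mathcal{F}_1(\alpha, \mathcal{H}, K^P, K^E) \triangleq &- [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)]^T \mathcal{H}_1(\alpha) \\
&- \mathcal{H}_1(\alpha) [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)] \\
&- Q(\alpha) - (K^P)^T(\alpha) R^P(\alpha) K^P(\alpha) + (K^E)^T(\alpha) R^E(\alpha) K^E(\alpha),
\end{aligned}$$

$$\begin{aligned}
\mathcal{F}_i(\alpha, \mathcal{H}, K^P, K^E) \triangleq &- [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)]^T \mathcal{H}_i(\alpha) \\
&- \mathcal{H}_i(\alpha) [A(\alpha) + B_P(\alpha) K^P(\alpha) + B_E(\alpha) K^E(\alpha)] \\
&- \sum_{j=1}^{i-1} \frac{2\, i!}{j!\,(i-j)!} \mathcal{H}_j(\alpha)\, G(\alpha) W G^T(\alpha)\, \mathcal{H}_{i-j}(\alpha), \quad 2 \le i \le k,
\end{aligned}$$

$$\mathcal{G}_i(\alpha, \mathcal{H}) \triangleq -\mathrm{Tr}\left\{\mathcal{H}_i(\alpha)\, G(\alpha) W G^T(\alpha)\right\}, \quad 1 \le i \le k.$$

For a compact formulation, the following product mappings are introduced

$$\mathcal{F}_1 \times \cdots \times \mathcal{F}_k : [t_0, t_f] \times (\mathbb{S}^n)^k \times \mathbb{R}^{m^P \times n} \times \mathbb{R}^{m^E \times n} \mapsto (\mathbb{S}^n)^k,$$

$$\mathcal{G}_1 \times \cdots \times \mathcal{G}_k : [t_0, t_f] \times (\mathbb{S}^n)^k \mapsto \mathbb{R}^k,$$

along with the corresponding notations $\mathcal{F} \triangleq \mathcal{F}_1 \times \cdots \times \mathcal{F}_k$ and $\mathcal{G} \triangleq \mathcal{G}_1 \times \cdots \times \mathcal{G}_k$. Thus, the dynamic equations (29)-(31) can be rewritten as follows

$$\frac{d}{d\alpha} \mathcal{H}(\alpha) = \mathcal{F}(\alpha, \mathcal{H}(\alpha), K^P(\alpha), K^E(\alpha)), \quad \mathcal{H}(t_f) \equiv \mathcal{H}_f, \tag{32}$$

$$\frac{d}{d\alpha} \mathcal{D}(\alpha) = \mathcal{G}(\alpha, \mathcal{H}(\alpha)), \quad \mathcal{D}(t_f) \equiv \mathcal{D}_f, \tag{33}$$

where the terminal values are $\mathcal{H}_f \triangleq (Q_f, 0, \ldots, 0)$ and $\mathcal{D}_f \triangleq (0, \ldots, 0)$.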
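The combinatorial weights multiplying the coupling terms in $\mathcal{F}_i$ are twice the binomial coefficients, since $i! / (j!\,(i-j)!) = \binom{i}{j}$. A small sketch tabulating them follows; the helper name `coupling_weights` is hypothetical.

```python
from math import comb, factorial

def coupling_weights(i):
    """Weights 2*i!/(j!*(i-j)!) for j = 1..i-1 in the i-th cumulant equation."""
    return [2 * comb(i, j) for j in range(1, i)]

# Weights for the second, third and fourth cumulant equations.
weights_by_order = {i: coupling_weights(i) for i in (2, 3, 4)}
```

The list is empty for `i = 1`, consistent with the first equation having no coupling sum, and it is symmetric in `j` and `i - j`.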

Note that the product system uniquely determines $\mathcal{H}$ and $\mathcal{D}$ once the admissible feedback gains $K^P$ and $K^E$ are specified. Hence, $\mathcal{H}$ and $\mathcal{D}$ are considered as $\mathcal{H}(\cdot, K^P, K^E)$ and $\mathcal{D}(\cdot, K^P, K^E)$, respectively. The performance index in cost-cumulant control can now be formulated in terms of the admissible feedback gains $K^P$ and $K^E$.

Definition 3: Performance Index.

Fix $k \in \mathbb{Z}^+$ and $\mu = \{\mu_i \ge 0\}_{i=1}^{k}$ with $\mu_1 > 0$. Then for given $(t_0, x_0)$, $\xi^P \in W^P$, and $\xi^E \in W^E$, the performance index $\phi_0 : [t_0, t_f] \times (\mathbb{S}^n)^k \times \mathbb{R}^k \mapsto \mathbb{R}^+$ of the cost-cumulant control is defined as

$$\begin{aligned}
\phi_0\big(t_0, \mathcal{H}(t_0, K^P, K^E), \mathcal{D}(t_0, K^P, K^E)\big) &\triangleq \sum_{i=1}^{k} \mu_i\, \kappa_i(K^P, K^E) \\
&= \sum_{i=1}^{k} \mu_i \left[ x_0^T \mathcal{H}_i(t_0, K^P, K^E)\, x_0 + \mathcal{D}_i(t_0, K^P, K^E) \right],
\end{aligned} \tag{34}$$

where the parametric design freedoms $\mu_i$, mutually chosen by the two non-cooperative teams, represent the different levels of influence that they deem important to the overall cost distribution of the multi-player pursuit-evasion game, and the solutions $\{\mathcal{H}_i(t_0, K^P, K^E) \ge 0\}_{i=1}^{k}$ and $\{\mathcal{D}_i(t_0, K^P, K^E) \ge 0\}_{i=1}^{k}$ evaluated at $\alpha = t_0$ satisfy the equations (32)-(33).
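Once the cumulant-building variables at $t_0$ are available, evaluating (34) is a weighted sum of quadratic forms. The following sketch assumes the matrices $\mathcal{H}_i(t_0)$ are given as nested lists and the scalars $\mathcal{D}_i(t_0)$ as floats; the names `quad_form` and `performance_index` and the numbers used are illustrative assumptions.

```python
def quad_form(x, M):
    """Quadratic form x^T M x for a vector x and a square matrix M (nested lists)."""
    n = len(x)
    return sum(x[r] * M[r][c] * x[c] for r in range(n) for c in range(n))

def performance_index(mu, H0, D0, x0):
    """Performance index (34): weighted sum of the first k cost cumulants."""
    if not (len(mu) == len(H0) == len(D0)):
        raise ValueError("mu, H0 and D0 must all have length k")
    if mu[0] <= 0 or any(m < 0 for m in mu):
        raise ValueError("Definition 3 requires mu_1 > 0 and mu_i >= 0")
    return sum(m * (quad_form(x0, H) + d) for m, H, d in zip(mu, H0, D0))

# Illustrative values (assumed): k = 2 cumulants of a two-state game.
x0 = [1.0, 2.0]
H0 = [[[1.0, 0.0], [0.0, 1.0]],   # H_1(t0)
      [[2.0, 0.0], [0.0, 0.0]]]   # H_2(t0)
D0 = [0.5, 1.0]
index_value = performance_index([1.0, 0.5], H0, D0, x0)
```

Here the first cumulant is $5 + 0.5$ and the second is $2 + 1$, so weighting them by $\mu = (1, 0.5)$ gives an index of $7$.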

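For the given terminal data $(t_f, \mathcal{H}_f, \mathcal{D}_f)$, the classes $\mathcal{K}^P_{t_f, \mathcal{H}_f, \mathcal{D}_f; \xi^P, \xi^E; \mu}$ and $\mathcal{K}^E_{t_f, \mathcal{H}_f, \mathcal{D}_f; \xi^P, \xi^E; \mu}$ of admissible feedback gains may be defined as follows.

Definition 4: Admissible Feedback Gain Strategies.

Let the compact subsets $\overline{K}^P \subset \mathbb{R}^{m^P \times n}$ and $\overline{K}^E \subset \mathbb{R}^{m^E \times n}$ be the sets of allowable gain values. For given $\xi^P \in W^P$, $\xi^E \in W^E$, $k \in \mathbb{Z}^+$, and $\mu = \{\mu_i \ge 0\}_{i=1}^{k}$ with $\mu_1 > 0$, the sets of admissible control strategies $\mathcal{K}^P_{t_f, \mathcal{H}_f, \mathcal{D}_f; \xi^P, \xi^E; \mu}$ and $\mathcal{K}^E_{t_f, \mathcal{H}_f, \mathcal{D}_f; \xi^P, \xi^E; \mu}$ are assumed to be the classes of $C([t_0, t_f]; \mathbb{R}^{m^P \times n})$ and $C([t_0, t_f]; \mathbb{R}^{m^E \times n})$ with values $K^P(\cdot) \in \overline{K}^P$ and $K^E(\cdot) \in \overline{K}^E$ for which solutions to the dynamic equations (32)-(33) exist on the finite horizon $[t_0, t_f]$.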

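Then one may state the optimization problem for the zero-sum stochastic differential game.

Definition 5: Optimization Problem.

Fix $\xi^P \in W^P$, $\xi^E \in W^E$, $k \in \mathbb{Z}^+$, and $\mu = \{\mu_i \ge 0\}_{i=1}^{k}$ with $\mu_1 > 0$. Then the optimization problem for multi-player pursuit-evasion over $[t_0, t_f]$ is given by

$$\min_{K^P(\cdot) \in \mathcal{K}^P_{t_f, \mathcal{H}_f, \mathcal{D}_f; \xi^P, \xi^E; \mu}}\ \max_{K^E(\cdot) \in \mathcal{K}^E_{t_f, \mathcal{H}_f, \mathcal{D}_f; \xi^P, \xi^E; \mu}} \phi_0\big(t_0, \mathcal{H}(t_0, K^P, K^E), \mathcal{D}(t_0, K^P, K^E)\big) \tag{35}$$

subject to the dynamic equations (32)-(33) for $\alpha \in [t_0, t_f]$.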

It is worth mentioning that the subject optimization is an initial-cost problem, in contrast with the more traditional terminal-cost class of investigations. One may address an initial-cost problem by introducing changes of variables that convert it to a terminal-cost problem. However, this modifies the natural context of cost cumulants, which it is preferable to retain. Instead, one may take a more direct dynamic programming approach to the initial-cost problem. Such an approach is illustrative of the more general concept of the principle of optimality, an idea tracing its roots back to the 17th century.

Following the tenet of transition from the principle of optimality, a family of games based on different starting points is now of concern. Begin by considering an interlude of time, $\varepsilon$, in mid-play. At its commencement the path has reached some definitive point. Consider all possible $(\mathcal{H}, \mathcal{D})$ which may be reached at the end of the interlude for all possible choices of $(K^P, K^E)$. Suppose that for each endpoint, the game beginning there has already been solved. Then the value function $V(\varepsilon, \mathcal{H}, \mathcal{D})$ resulting from each choice of $(K^P, K^E)$ is known, and the gains are to be chosen so as to render it minimax. As the duration of the interlude approaches $t_f$, this leads to a sufficient condition for the Hamilton-Jacobi-Isaacs (HJI) equation.

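Definition 6: Playable Set.

Let the playable set $\mathcal{Q}$ be defined as follows

$$\mathcal{Q} \triangleq \left\{ (\varepsilon, \mathcal{Y}, \mathcal{Z}) \in [t_0, t_f] \times (\mathbb{S}^n)^k \times \mathbb{R}^k \ \text{such that}\ \mathcal{K}^P_{\varepsilon, \mathcal{Y}, \mathcal{Z}; \xi^P, \xi^E; \mu} \times \mathcal{K}^E_{\varepsilon, \mathcal{Y}, \mathcal{Z}; \xi^P, \xi^E; \mu} \ne \emptyset \right\}.$$

The fundamental theorem of calculus and stochastic differential rules can be used to derive a saddle point.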

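Theorem 3: Existence of a Saddle Point.

Fix $k \in \mathbb{Z}^+$ and $\mu = \{\mu_i \ge 0\}_{i=1}^{k}$ with $\mu_1 > 0$. Then for given $(t_0, x_0)$, $\xi^P \in W^P$ and $\xi^E \in W^E$, there exists a saddle point $(K^{P*}, K^{E*}) \in \mathcal{K}^P_{t_f, \mathcal{H}_f, \mathcal{D}_f; \xi^P, \xi^E; \mu} \times \mathcal{K}^E_{t_f, \mathcal{H}_f, \mathcal{D}_f; \xi^P, \xi^E; \mu}$ such that

$$\begin{aligned}
\phi_0\big(t_0, \mathcal{H}(t_0, K^{P*}, K^E), \mathcal{D}(t_0, K^{P*}, K^E)\big) &\le \phi_0\big(t_0, \mathcal{H}(t_0, K^{P*}, K^{E*}), \mathcal{D}(t_0, K^{P*}, K^{E*})\big) \\
&\le \phi_0\big(t_0, \mathcal{H}(t_0, K^P, K^{E*}), \mathcal{D}(t_0, K^P, K^{E*})\big).
\end{aligned}$$

Therefore, the existence of a saddle point yields both necessary and sufficient conditions for the minimax problem to be equivalent to the corresponding maximin problem.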
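The saddle-point inequalities and the resulting equality of minimax and maximin can be illustrated on a toy scalar payoff. The sketch below is not the game of the paper: it uses an assumed payoff $\phi(p, e) = p^2 - e^2 + p e$ over a finite grid of decisions, with hypothetical helper names, purely to show how the two inequalities are checked.

```python
def is_saddle_point(phi, p_star, e_star, p_grid, e_grid):
    """Check phi(p*, e) <= phi(p*, e*) <= phi(p, e*) over the given grids."""
    value = phi(p_star, e_star)
    maximizer_cannot_gain = all(phi(p_star, e) <= value for e in e_grid)
    minimizer_cannot_gain = all(value <= phi(p, e_star) for p in p_grid)
    return maximizer_cannot_gain and minimizer_cannot_gain

def minimax(phi, p_grid, e_grid):
    """Value when the minimizer commits first and the maximizer responds."""
    return min(max(phi(p, e) for e in e_grid) for p in p_grid)

def maximin(phi, p_grid, e_grid):
    """Value when the maximizer commits first and the minimizer responds."""
    return max(min(phi(p, e) for p in p_grid) for e in e_grid)

def phi(p, e):
    """Toy payoff (assumed): convex in p, concave in e."""
    return p * p - e * e + p * e

grid = [0.5 * i for i in range(-4, 5)]  # decisions -2.0, -1.5, ..., 2.0
```

On this grid the pair $(0, 0)$ satisfies both inequalities, and the minimax and maximin values coincide at the saddle value, whereas a pair such as $(1, 0)$ fails the check because the maximizer can improve on it.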

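Theorem 4: Differentiability of Value Function.

Let the admissible feedback gains $K^{P*}(\alpha, \mathcal{H}, \mathcal{D})$ and $K^{E*}(\alpha, \mathcal{H}, \mathcal{D})$ constitute a saddle point. Further, let $t_0(\varepsilon, \mathcal{Y}, \mathcal{Z})$ and $\big(\mathcal{H}(t_0(\varepsilon, \mathcal{Y}, \mathcal{Z}); \varepsilon, \mathcal{Y}), \mathcal{D}(t_0(\varepsilon, \mathcal{Y}, \mathcal{Z}); \varepsilon, \mathcal{Z})\big)$ be the initial time and initial states for the trajectories of

$$\frac{d}{d\alpha} \mathcal{H}(\alpha) = \mathcal{F}\big(\alpha, \mathcal{H}, K^{P*}(\alpha, \mathcal{H}, \mathcal{D}), K^{E*}(\alpha, \mathcal{H}, \mathcal{D})\big),$$

$$\frac{d}{d\alpha} \mathcal{D}(\alpha) = \mathcal{G}(\alpha, \mathcal{H}),$$

with the terminal condition $(\varepsilon, \mathcal{Y}, \mathcal{Z})$. Then, the value function $V(\varepsilon, \mathcal{Y}, \mathcal{Z})$ is differentiable at each point at which $t_0(\varepsilon, \mathcal{Y}, \mathcal{Z})$, $\mathcal{H}(t_0(\varepsilon, \mathcal{Y}, \mathcal{Z}); \varepsilon, \mathcal{Y})$ and $\mathcal{D}(t_0(\varepsilon, \mathcal{Y}, \mathcal{Z}); \varepsilon, \mathcal{Z})$ are differentiable with respect to $(\varepsilon, \mathcal{Y}, \mathcal{Z})$.

Moreover, if the value function is continuously differentiable, then such a saddle point is unique.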

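Theorem 5: HJI Equation-Mayer Problem.

Let $(\varepsilon, \mathcal{Y}, \mathcal{Z})$ be any interior point of the playable set $\mathcal{Q}$ at which the value function $V(\varepsilon, \mathcal{Y}, \mathcal{Z})$ is differentiable. If there exists a saddle point $(K^{P*}, K^{E*}) \in \mathcal{K}^P_{\varepsilon, \mathcal{Y}, \mathcal{Z}; \xi^P, \xi^E; \mu} \times \mathcal{K}^E_{\varepsilon, \mathcal{Y}, \mathcal{Z}; \xi^P, \xi^E; \mu}$, then the partial differential equation of the pursuit-evasion differential game

$$\begin{aligned}
0 = \min_{K^P \in \overline{K}^P}\ \max_{K^E \in \overline{K}^E} \Big\{ &\frac{\partial}{\partial \varepsilon} V(\varepsilon, \mathcal{Y}, \mathcal{Z}) + \frac{\partial}{\partial\, \mathrm{vec}(\mathcal{Y})} V(\varepsilon, \mathcal{Y}, \mathcal{Z}) \cdot \mathrm{vec}\big(\mathcal{F}(\varepsilon, \mathcal{Y}, K^P, K^E)\big) \\
&+ \frac{\partial}{\partial\, \mathrm{vec}(\mathcal{Z})} V(\varepsilon, \mathcal{Y}, \mathcal{Z}) \cdot \mathrm{vec}\big(\mathcal{G}(\varepsilon, \mathcal{Y})\big) \Big\}
\end{aligned} \tag{36}$$

is satisfied together with the boundary condition

$$V(t_0, \mathcal{H}_0, \mathcal{D}_0) = \phi_0(t_0, \mathcal{H}_0, \mathcal{D}_0),$$

where $\mathrm{vec}(\cdot)$ is the vectorizing operator of enclosed entities.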


IV.  Saddle-Point  Strategies 

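The approach to obtaining a saddle-point solution requires parametrization of the terminal time and states of the optimization problem as $(\varepsilon, \mathcal{Y}, \mathcal{Z})$ rather than $(t_f, \mathcal{H}_f, \mathcal{D}_f)$. That is, for $\varepsilon \in [t_0, t_f]$ and $1 \le i \le k$, the states of the system (32)-(33) defined on the interval $[t_0, \varepsilon]$ have the terminal values denoted by $\mathcal{H}(\varepsilon) \equiv \mathcal{Y}$ and $\mathcal{D}(\varepsilon) \equiv \mathcal{Z}$. Observe that the cumulant-based performance index (34) is quadratic affine in terms of the arbitrarily fixed $x_0$. This suggests that a solution to the HJI equation (36) may be sought in the form

$$W(\varepsilon, \mathcal{Y}, \mathcal{Z}) = x_0^T \sum_{i=1}^{k} \mu_i \big(\mathcal{Y}_i + \mathcal{E}_i(\varepsilon)\big)\, x_0 + \sum_{i=1}^{k} \mu_i \big(\mathcal{Z}_i + \mathcal{T}_i(\varepsilon)\big), \tag{37}$$

where the parametric functions of time $\mathcal{E}_i \in C^1([t_0, t_f]; \mathbb{S}^n)$ and $\mathcal{T}_i \in C^1([t_0, t_f]; \mathbb{R})$ are to be determined.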

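Theorem 6: Time Derivative of a Candidate Function.

Fix $k \in \mathbb{Z}^+$ and let $(\varepsilon, \mathcal{Y}, \mathcal{Z})$ be any interior point of the reachable set $\mathcal{Q}$ at which the real-valued function (37) is differentiable. Then, the time derivative of $W(\varepsilon, \mathcal{Y}, \mathcal{Z})$ is found to be

$$\frac{d}{d\varepsilon} W(\varepsilon, \mathcal{Y}, \mathcal{Z}) = \sum_{i=1}^{k} \mu_i \Big( \mathcal{G}_i(\varepsilon, \mathcal{Y}) + \frac{d}{d\varepsilon} \mathcal{T}_i(\varepsilon) \Big) + x_0^T \sum_{i=1}^{k} \mu_i \Big( \mathcal{F}_i(\varepsilon, \mathcal{Y}, K^P, K^E) + \frac{d}{d\varepsilon} \mathcal{E}_i(\varepsilon) \Big)\, x_0. \tag{38}$$

Substituting the hypothesized solution (37) into the HJI equation (36) and making use of the result (38) yields

$$\begin{aligned}
0 &= \min_{K^P \in \overline{K}^P}\ \max_{K^E \in \overline{K}^E} \Big\{ \frac{\partial}{\partial \varepsilon} W(\varepsilon, \mathcal{Y}, \mathcal{Z}) + \frac{\partial}{\partial\, \mathrm{vec}(\mathcal{Y})} W(\varepsilon, \mathcal{Y}, \mathcal{Z}) \cdot \mathrm{vec}\big(\mathcal{F}(\varepsilon, \mathcal{Y}, K^P, K^E)\big) + \frac{\partial}{\partial\, \mathrm{vec}(\mathcal{Z})} W(\varepsilon, \mathcal{Y}, \mathcal{Z}) \cdot \mathrm{vec}\big(\mathcal{G}(\varepsilon, \mathcal{Y})\big) \Big\} \\
&= \min_{K^P \in \overline{K}^P}\ \max_{K^E \in \overline{K}^E} \Big\{ x_0^T \Big( \sum_{i=1}^{k} \mu_i \frac{d}{d\varepsilon} \mathcal{E}_i(\varepsilon) \Big) x_0 + \sum_{i=1}^{k} \mu_i \frac{d}{d\varepsilon} \mathcal{T}_i(\varepsilon) + x_0^T \Big( \sum_{i=1}^{k} \mu_i \mathcal{F}_i(\varepsilon, \mathcal{Y}, K^P, K^E) \Big) x_0 + \sum_{i=1}^{k} \mu_i \mathcal{G}_i(\varepsilon, \mathcal{Y}) \Big\}.
\end{aligned} \tag{39}$$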

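It is important to observe that

$$\begin{aligned}
\sum_{i=1}^{k} \mu_i \mathcal{F}_i(\varepsilon, \mathcal{Y}, K^P, K^E) = &- [A(\varepsilon) + B_P(\varepsilon) K^P + B_E(\varepsilon) K^E]^T \sum_{i=1}^{k} \mu_i \mathcal{Y}_i - \sum_{i=1}^{k} \mu_i \mathcal{Y}_i\, [A(\varepsilon) + B_P(\varepsilon) K^P + B_E(\varepsilon) K^E] \\
&- \mu_1 Q(\varepsilon) - \mu_1 (K^P)^T R^P(\varepsilon) K^P + \mu_1 (K^E)^T R^E(\varepsilon) K^E \\
&- \sum_{i=2}^{k} \mu_i \sum_{j=1}^{i-1} \frac{2\, i!}{j!\,(i-j)!} \mathcal{Y}_j\, G(\varepsilon) W G^T(\varepsilon)\, \mathcal{Y}_{i-j},
\end{aligned}$$

$$\sum_{i=1}^{k} \mu_i \mathcal{G}_i(\varepsilon, \mathcal{Y}) = -\sum_{i=1}^{k} \mu_i\, \mathrm{Tr}\left\{\mathcal{Y}_i\, G(\varepsilon) W G^T(\varepsilon)\right\}.$$

Differentiating the expression within the bracket of (39) with respect to $K^P$ and $K^E$ yields the necessary conditions for an extremum of the performance index (34) on $[t_0, \varepsilon]$, with $M_0 \triangleq x_0 x_0^T$,

$$-2 B_P^T(\varepsilon) \sum_{i=1}^{k} \mu_i \mathcal{Y}_i M_0 - 2 \mu_1 R^P(\varepsilon) K^P M_0 = 0,$$

$$-2 B_E^T(\varepsilon) \sum_{i=1}^{k} \mu_i \mathcal{Y}_i M_0 + 2 \mu_1 R^E(\varepsilon) K^E M_0 = 0.$$

Because $M_0$ is an arbitrary rank-one matrix, it must be true that

$$K^P(\varepsilon, \mathcal{Y}, \mathcal{Z}) = -(R^P)^{-1}(\varepsilon) B_P^T(\varepsilon) \sum_{r=1}^{k} \hat{\mu}_r \mathcal{Y}_r, \tag{40}$$

$$K^E(\varepsilon, \mathcal{Y}, \mathcal{Z}) = (R^E)^{-1}(\varepsilon) B_E^T(\varepsilon) \sum_{r=1}^{k} \hat{\mu}_r \mathcal{Y}_r, \tag{41}$$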

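where $\hat{\mu}_r \triangleq \mu_r / \mu_1$ for $\mu_1 > 0$. Substituting the gain expressions (40) and (41) into the right member of the HJI equation (39) yields the value of the minimum

$$\begin{aligned}
x_0^T \Big[ &\sum_{i=1}^{k} \mu_i \frac{d}{d\varepsilon} \mathcal{E}_i(\varepsilon) - A^T(\varepsilon) \sum_{i=1}^{k} \mu_i \mathcal{Y}_i - \sum_{i=1}^{k} \mu_i \mathcal{Y}_i A(\varepsilon) - \mu_1 Q(\varepsilon) \\
&+ \sum_{r=1}^{k} \hat{\mu}_r \mathcal{Y}_r B_P(\varepsilon) (R^P)^{-1}(\varepsilon) B_P^T(\varepsilon) \sum_{s=1}^{k} \mu_s \mathcal{Y}_s + \sum_{r=1}^{k} \mu_r \mathcal{Y}_r B_P(\varepsilon) (R^P)^{-1}(\varepsilon) B_P^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{Y}_s \\
&- \sum_{r=1}^{k} \hat{\mu}_r \mathcal{Y}_r B_E(\varepsilon) (R^E)^{-1}(\varepsilon) B_E^T(\varepsilon) \sum_{s=1}^{k} \mu_s \mathcal{Y}_s - \sum_{r=1}^{k} \mu_r \mathcal{Y}_r B_E(\varepsilon) (R^E)^{-1}(\varepsilon) B_E^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{Y}_s \\
&- \mu_1 \sum_{r=1}^{k} \hat{\mu}_r \mathcal{Y}_r B_P(\varepsilon) (R^P)^{-1}(\varepsilon) B_P^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{Y}_s + \mu_1 \sum_{r=1}^{k} \hat{\mu}_r \mathcal{Y}_r B_E(\varepsilon) (R^E)^{-1}(\varepsilon) B_E^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{Y}_s \\
&- \sum_{i=2}^{k} \mu_i \sum_{j=1}^{i-1} \frac{2\, i!}{j!\,(i-j)!} \mathcal{Y}_j\, G(\varepsilon) W G^T(\varepsilon)\, \mathcal{Y}_{i-j} \Big] x_0 \\
&+ \sum_{i=1}^{k} \mu_i \frac{d}{d\varepsilon} \mathcal{T}_i(\varepsilon) - \sum_{i=1}^{k} \mu_i\, \mathrm{Tr}\left\{\mathcal{Y}_i\, G(\varepsilon) W G^T(\varepsilon)\right\}.
\end{aligned} \tag{42}$$

It is now necessary to exhibit time-dependent functions $\{\mathcal{E}_i(\cdot)\}_{i=1}^{k}$ and $\{\mathcal{T}_i(\cdot)\}_{i=1}^{k}$ which will render the expression (42) equal to zero for $\varepsilon \in [t_0, t_f]$ when $\{\mathcal{Y}_i\}_{i=1}^{k}$ are evaluated along solution trajectories of the cumulant-generating equations. Studying the expression (42) reveals that $\mathcal{E}_i(\cdot)$ and $\mathcal{T}_i(\cdot)$ for $1 \le i \le k$ satisfying the backward-in-time differential equations

$$\begin{aligned}
\frac{d}{d\varepsilon} \mathcal{E}_1(\varepsilon) = &\ A^T(\varepsilon) \mathcal{H}_1(\varepsilon) + \mathcal{H}_1(\varepsilon) A(\varepsilon) + Q(\varepsilon) \\
&- \mathcal{H}_1(\varepsilon) B_P(\varepsilon) (R^P)^{-1}(\varepsilon) B_P^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon) - \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) B_P(\varepsilon) (R^P)^{-1}(\varepsilon) B_P^T(\varepsilon) \mathcal{H}_1(\varepsilon) \\
&+ \mathcal{H}_1(\varepsilon) B_E(\varepsilon) (R^E)^{-1}(\varepsilon) B_E^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon) + \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) B_E(\varepsilon) (R^E)^{-1}(\varepsilon) B_E^T(\varepsilon) \mathcal{H}_1(\varepsilon) \\
&+ \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) B_P(\varepsilon) (R^P)^{-1}(\varepsilon) B_P^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon) - \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) B_E(\varepsilon) (R^E)^{-1}(\varepsilon) B_E^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon),
\end{aligned} \tag{43}$$

and, for $2 \le i \le k$,

$$\begin{aligned}
\frac{d}{d\varepsilon} \mathcal{E}_i(\varepsilon) = &\ A^T(\varepsilon) \mathcal{H}_i(\varepsilon) + \mathcal{H}_i(\varepsilon) A(\varepsilon) \\
&- \mathcal{H}_i(\varepsilon) B_P(\varepsilon) (R^P)^{-1}(\varepsilon) B_P^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon) - \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) B_P(\varepsilon) (R^P)^{-1}(\varepsilon) B_P^T(\varepsilon) \mathcal{H}_i(\varepsilon) \\
&+ \mathcal{H}_i(\varepsilon) B_E(\varepsilon) (R^E)^{-1}(\varepsilon) B_E^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon) + \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) B_E(\varepsilon) (R^E)^{-1}(\varepsilon) B_E^T(\varepsilon) \mathcal{H}_i(\varepsilon) \\
&+ \sum_{j=1}^{i-1} \frac{2\, i!}{j!\,(i-j)!} \mathcal{H}_j(\varepsilon)\, G(\varepsilon) W G^T(\varepsilon)\, \mathcal{H}_{i-j}(\varepsilon),
\end{aligned} \tag{44}$$

$$\frac{d}{d\varepsilon} \mathcal{T}_i(\varepsilon) = \mathrm{Tr}\left\{\mathcal{H}_i(\varepsilon)\, G(\varepsilon) W G^T(\varepsilon)\right\}, \quad 1 \le i \le k, \tag{45}$$

will work. Furthermore, at the boundary condition, it is necessary to have $W(t_0, \mathcal{H}_0, \mathcal{D}_0) = \phi_0(t_0, \mathcal{H}_0, \mathcal{D}_0)$, or equivalently

$$x_0^T \sum_{i=1}^{k} \mu_i \big(\mathcal{H}_{i0} + \mathcal{E}_i(t_0)\big)\, x_0 + \sum_{i=1}^{k} \mu_i \big(\mathcal{D}_{i0} + \mathcal{T}_i(t_0)\big) = x_0^T \sum_{i=1}^{k} \mu_i \mathcal{H}_{i0}\, x_0 + \sum_{i=1}^{k} \mu_i \mathcal{D}_{i0}.$$

Thus, matching the boundary condition yields the corresponding initial value conditions $\mathcal{E}_i(t_0) = 0$ and $\mathcal{T}_i(t_0) = 0$ for the equations (43)-(45). Applying the feedback gains specified in (40) and (41) along the solution trajectories of the equations (32)-(33), these equations become the Riccati-type equations

$$\begin{aligned}
\frac{d}{d\varepsilon} \mathcal{H}_1(\varepsilon) = &- A^T(\varepsilon) \mathcal{H}_1(\varepsilon) - \mathcal{H}_1(\varepsilon) A(\varepsilon) - Q(\varepsilon) \\
&+ \mathcal{H}_1(\varepsilon) B_P(\varepsilon) (R^P)^{-1}(\varepsilon) B_P^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon) + \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) B_P(\varepsilon) (R^P)^{-1}(\varepsilon) B_P^T(\varepsilon) \mathcal{H}_1(\varepsilon) \\
&- \mathcal{H}_1(\varepsilon) B_E(\varepsilon) (R^E)^{-1}(\varepsilon) B_E^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon) - \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) B_E(\varepsilon) (R^E)^{-1}(\varepsilon) B_E^T(\varepsilon) \mathcal{H}_1(\varepsilon) \\
&- \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) B_P(\varepsilon) (R^P)^{-1}(\varepsilon) B_P^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon) + \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) B_E(\varepsilon) (R^E)^{-1}(\varepsilon) B_E^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon),
\end{aligned} \tag{46}$$

and, for $2 \le i \le k$,

$$\begin{aligned}
\frac{d}{d\varepsilon} \mathcal{H}_i(\varepsilon) = &- A^T(\varepsilon) \mathcal{H}_i(\varepsilon) - \mathcal{H}_i(\varepsilon) A(\varepsilon) \\
&+ \mathcal{H}_i(\varepsilon) B_P(\varepsilon) (R^P)^{-1}(\varepsilon) B_P^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon) + \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) B_P(\varepsilon) (R^P)^{-1}(\varepsilon) B_P^T(\varepsilon) \mathcal{H}_i(\varepsilon) \\
&- \mathcal{H}_i(\varepsilon) B_E(\varepsilon) (R^E)^{-1}(\varepsilon) B_E^T(\varepsilon) \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon) - \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) B_E(\varepsilon) (R^E)^{-1}(\varepsilon) B_E^T(\varepsilon) \mathcal{H}_i(\varepsilon) \\
&- \sum_{j=1}^{i-1} \frac{2\, i!}{j!\,(i-j)!} \mathcal{H}_j(\varepsilon)\, G(\varepsilon) W G^T(\varepsilon)\, \mathcal{H}_{i-j}(\varepsilon),
\end{aligned} \tag{47}$$

together with, for $1 \le i \le k$,

$$\frac{d}{d\varepsilon} \mathcal{D}_i(\varepsilon) = -\mathrm{Tr}\left\{\mathcal{H}_i(\varepsilon)\, G(\varepsilon) W G^T(\varepsilon)\right\}, \tag{48}$$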

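where the terminal conditions are $\mathcal{H}_1(t_f) = Q_f$, $\mathcal{H}_i(t_f) = 0$ for $2 \le i \le k$, and $\mathcal{D}_i(t_f) = 0$ for $1 \le i \le k$. Thus, whenever the equations (46)-(48) admit solutions $\{\mathcal{H}_i(\cdot)\}_{i=1}^{k}$ and $\{\mathcal{D}_i(\cdot)\}_{i=1}^{k}$, the existence of $\{\mathcal{E}_i(\cdot)\}_{i=1}^{k}$ and $\{\mathcal{T}_i(\cdot)\}_{i=1}^{k}$ satisfying the equations (43)-(45) is assured. By comparing equations (43)-(45) to those of (46)-(48), one may recognize that these sets of equations are related to one another by

$$\frac{d}{d\varepsilon} \mathcal{E}_i(\varepsilon) = -\frac{d}{d\varepsilon} \mathcal{H}_i(\varepsilon) \quad \text{and} \quad \frac{d}{d\varepsilon} \mathcal{T}_i(\varepsilon) = -\frac{d}{d\varepsilon} \mathcal{D}_i(\varepsilon)$$

for $1 \le i \le k$. Enforcing the initial value conditions $\mathcal{E}_i(t_0) = 0$ and $\mathcal{T}_i(t_0) = 0$ uniquely implies that

$$\mathcal{E}_i(\varepsilon) = \mathcal{H}_i(t_0) - \mathcal{H}_i(\varepsilon) \quad \text{and} \quad \mathcal{T}_i(\varepsilon) = \mathcal{D}_i(t_0) - \mathcal{D}_i(\varepsilon)$$

for all $\varepsilon \in [t_0, t_f]$ and yields a value function

$$W(\varepsilon, \mathcal{Y}, \mathcal{Z}) = V(\varepsilon, \mathcal{Y}, \mathcal{Z}) = x_0^T \sum_{i=1}^{k} \mu_i \mathcal{H}_i(t_0)\, x_0 + \sum_{i=1}^{k} \mu_i \mathcal{D}_i(t_0),$$

and $\mu = \{\mu_i \ge 0\}_{i=1}^{k}$ with $\mu_1 > 0$. Then the saddle-point solution is achieved by the non-inferior strategy gains

$$K^{P*}(\alpha) = -(R^P)^{-1}(\alpha) B_P^T(\alpha) \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r^*(\alpha), \tag{51}$$

$$K^{E*}(\alpha) = (R^E)^{-1}(\alpha) B_E^T(\alpha) \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r^*(\alpha), \tag{52}$$

where the additional parametric design freedoms $\hat{\mu}_r$, mutually chosen by the rival teams, represent the different levels of influence that they deem important to the global cost distribution, and $\{\mathcal{H}_r^*(\alpha) \ge 0\}_{r=1}^{k}$ solve the coupled differential equations

$$\begin{aligned}
\frac{d}{d\alpha} \mathcal{H}_1^*(\alpha) = &- [A(\alpha) + B_P(\alpha) K^{P*}(\alpha) + B_E(\alpha) K^{E*}(\alpha)]^T \mathcal{H}_1^*(\alpha) \\
&- \mathcal{H}_1^*(\alpha) [A(\alpha) + B_P(\alpha) K^{P*}(\alpha) + B_E(\alpha) K^{E*}(\alpha)] \\
&- Q(\alpha) - (K^{P*})^T(\alpha) R^P(\alpha) K^{P*}(\alpha) + (K^{E*})^T(\alpha) R^E(\alpha) K^{E*}(\alpha), \quad \mathcal{H}_1^*(t_f) = Q_f,
\end{aligned} \tag{53}$$

and, for $2 \le r \le k$ with $\mathcal{H}_r^*(t_f) = 0$,

$$\begin{aligned}
\frac{d}{d\alpha} \mathcal{H}_r^*(\alpha) = &- [A(\alpha) + B_P(\alpha) K^{P*}(\alpha) + B_E(\alpha) K^{E*}(\alpha)]^T \mathcal{H}_r^*(\alpha) \\
&- \mathcal{H}_r^*(\alpha) [A(\alpha) + B_P(\alpha) K^{P*}(\alpha) + B_E(\alpha) K^{E*}(\alpha)] \\
&- \sum_{s=1}^{r-1} \frac{2\, r!}{s!\,(r-s)!} \mathcal{H}_s^*(\alpha)\, G(\alpha) W G^T(\alpha)\, \mathcal{H}_{r-s}^*(\alpha).
\end{aligned} \tag{54}$$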
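The coupled equations (53)-(54) with the gains (51)-(52) can be integrated numerically once the data are fixed. The following sketch treats a scalar, time-invariant instance only; all parameter values and the names `saddle_rhs` and `solve_saddle` are assumptions, with `s_p` and `s_e` denoting the scalars $B_P (R^P)^{-1} B_P^T$ and $B_E (R^E)^{-1} B_E^T$ and `sigma` the scalar $G W G^T$. It is not claimed to be the authors' computational procedure.

```python
from math import comb, tanh

def saddle_rhs(H, mu_hat, a, q, s_p, s_e, sigma):
    """Scalar right-hand sides of (53)-(54) with the gains (51)-(52) substituted."""
    h_bar = sum(m * h for m, h in zip(mu_hat, H))  # sum_r mu_hat_r * H_r
    a_cl = a - s_p * h_bar + s_e * h_bar           # A + B_P K^P* + B_E K^E*
    dH = []
    for r in range(1, len(H) + 1):
        val = -2.0 * a_cl * H[r - 1]
        if r == 1:
            val += -q - s_p * h_bar ** 2 + s_e * h_bar ** 2
        else:
            val -= sum(2 * comb(r, s) * H[s - 1] * sigma * H[r - s - 1]
                       for s in range(1, r))
        dH.append(val)
    return dH

def solve_saddle(mu, a, q, q_f, b_p, r_p, b_e, r_e, sigma, t0, tf, steps=2000):
    """Return (H*(t0), K^P*(t0), K^E*(t0)) by backward RK4 from the terminal data."""
    mu_hat = [m / mu[0] for m in mu]
    s_p, s_e = b_p ** 2 / r_p, b_e ** 2 / r_e
    H = [q_f] + [0.0] * (len(mu) - 1)
    h = (t0 - tf) / steps  # negative step: backward in time

    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]

    for _ in range(steps):
        k1 = saddle_rhs(H, mu_hat, a, q, s_p, s_e, sigma)
        k2 = saddle_rhs(shifted(H, k1, h / 2), mu_hat, a, q, s_p, s_e, sigma)
        k3 = saddle_rhs(shifted(H, k2, h / 2), mu_hat, a, q, s_p, s_e, sigma)
        k4 = saddle_rhs(shifted(H, k3, h), mu_hat, a, q, s_p, s_e, sigma)
        H = [x + h * (d1 + 2 * d2 + 2 * d3 + d4) / 6
             for x, d1, d2, d3, d4 in zip(H, k1, k2, k3, k4)]
    h_bar = sum(m * x for m, x in zip(mu_hat, H))
    return H, -b_p / r_p * h_bar, b_e / r_e * h_bar

# Illustrative run (assumed data): mean and variance weighted, pursuer slightly stronger.
H_t0, k_p_t0, k_e_t0 = solve_saddle([1.0, 0.1], 0.0, 1.0, 0.0,
                                    1.0, 1.0, 1.0, 2.0, 0.5, 0.0, 1.0)
```

Two special cases are useful checks for $k = 1$: when the two teams have equal control authority the quadratic terms cancel and $\mathcal{H}_1^*(t_0)$ equals the horizon length for unit $Q$, and when the evaders have no control authority the equation reduces to a standard scalar Riccati equation whose solution is $\tanh(t_f - t_0)$ for unit data.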
V.  Conclusions

This paper dealt with a multi-player pursuit-evasion differential game modeled in a stochastic environment for realistic conditions. Matrix differential equations for generating statistics of the standard integral-quadratic performance measure used in this game were derived. A direct dynamic programming approach was used to solve for saddle-point solutions that address both control strategy selection and performance analysis. It is hoped that these results contribute new theoretical insights and performance-analysis tools to the stochastic differential game community.